This document describes the data created for APRA’s fundraising data science online learning courses and workshops. All of the data created for these purposes is fictitious.
There are three data sets available as of 2020-08-12: the biographical data, the giving data, and the engagement data.
Each of these tables and the variables contained within each are described below. There are tabs included throughout the document that can be used to explore the variables included in each data set.
These data sets are designed to mirror realistic fundraising data and are not intended to be perfectly “clean” data. Common fundraising data challenges are built into the data files. To learn more about a particular data set, click on its tab above, for example the Biographical Data tab.
All of the code for this project is available on GitHub. The code that generates the data sets can be found in the generate_data.R script.
The individual data sets can be read into R directly from GitHub as follows.
```r
# load the tidyverse (read_csv and the dplyr verbs) and knitr (kable)
library(tidyverse)
library(knitr)

# read bio data csv into R and store in a data frame named bio
bio <- read_csv("https://raw.githubusercontent.com/majerus/apra_data_science_courses/master/bio_data_table.csv")

# print a random sample of 10 donors, with selected columns, as a table
bio %>%
  sample_n(10) %>%
  select(id, name, birthday, city, state, capacity, capacity_source) %>%
  kable()
```
| id | name | birthday | city | state | capacity | capacity_source |
|---|---|---|---|---|---|---|
| 5399183 | al-Sabet, Haniyya | 1966-12-15 | Birmingham | AL | $75k - $100k | screening |
| 4999665 | Oakley, Kevin | 1959-12-20 | Fox island | WA | $75k - $100k | screening |
| 7977859 | Clayton, Marisa | 1957-12-22 | Chicago | IL | $10k - $25k | screening |
| 2845485 | Minjarez, Brandon | 1956-06-30 | Detroit | MI | $50k - $75K | screening |
| 3103314 | Vaz, Surafale | 1937-11-13 | Sun city west | AZ | $5k - $10k | screening |
| 3720048 | Nguyen, Matthew | 1980-06-05 | Gainesville | GA | $25k - $50k | NA |
| 2168995 | el-Ishak, Saleet | 1969-09-02 | Steubenville | OH | $100k - $250k | screening |
| 6188537 | Burchfield, Vincent | 1992-04-04 | Iselin | NJ | NA | institutional |
| 5247695 | el-Mansouri, Maazin | 1966-10-11 | Hope | AR | NA | screening |
| 6950846 | Lai, Kayla | NA | Flint | MI | $50k - $75K | institutional |
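The sample above already hints at the kind of messiness to expect: missing values (`NA`), a capacity band written as `$50k - $75K` with an upper-case K, and city names such as `Fox island` with inconsistent capitalization. Whether each of these is one of the deliberately built-in challenges is an assumption. The sketch below rebuilds a few of the printed rows by hand (the `bio_sample` data frame is only an illustration, not part of the project) and shows one possible way to normalize them.

```r
library(dplyr)
library(stringr)

# A few rows copied from the sample table above (illustrative only)
bio_sample <- tibble::tribble(
  ~id,     ~city,           ~capacity,
  2845485, "Detroit",       "$50k - $75K",
  3103314, "Sun city west", "$5k - $10k",
  6188537, "Iselin",        NA,
  4999665, "Fox island",    "$75k - $100k"
)

bio_clean <- bio_sample %>%
  mutate(
    capacity = str_to_lower(capacity), # "$50k - $75K" becomes "$50k - $75k"
    city     = str_to_title(city)      # "Fox island" becomes "Fox Island"
  )

# Count missing values in each column
missing_counts <- colSums(is.na(bio_clean))
```

The same `mutate()` call could be applied to the full `bio` data frame, after checking which other spellings actually occur with `count(bio, capacity)`.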
The biographical data has 14 variables and 100,000 observations. The data is stored at the donor level. Each row of the data represents a unique donor and biographical information about that donor.
There are 4 numeric variables:
## Rows: 100,000
## Columns: 4
## $ id <dbl> 9671621, 6098249, 7804098, 2065649, 2290208, 2566581, 88…
## $ household_id <dbl> 1000202, 1000504, 1000843, 1000843, 1000856, 1000856, 10…
## $ lat <dbl> 45.49, 38.03, 42.23, 38.60, 29.43, 36.14, 33.98, 42.11, …
## $ lon <dbl> -122.72, -78.48, -91.19, -89.68, -95.24, -115.27, -118.0…
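In the values printed above, some `household_id` values repeat (for example 1000843), which suggests that several donors can share a household. A minimal sketch of checking donor uniqueness and counting donors per household, using only the six rows visible in the output (the `bio_ids` data frame is a hand-built illustration):

```r
library(dplyr)

# First six id / household_id pairs copied from the output above
bio_ids <- tibble::tibble(
  id           = c(9671621, 6098249, 7804098, 2065649, 2290208, 2566581),
  household_id = c(1000202, 1000504, 1000843, 1000843, 1000856, 1000856)
)

# TRUE when every row is a distinct donor
ids_unique <- anyDuplicated(bio_ids$id) == 0

# Number of donors in each household
household_sizes <- bio_ids %>%
  count(household_id, name = "n_donors")
```

Running the same two steps on the full `bio` data frame is a quick way to confirm the "one row per donor" claim before joining to other tables.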
When loaded with the default settings, there are 9 character variables:
## Rows: 100,000
## Columns: 9
## $ name <chr> "Tran, Ntxuam", "Probeck, William", "Buchanan-Sam, Ki…
## $ country <chr> "United States", "United States", "United States", "U…
## $ city <chr> "Portland", "Charlottesville", "Monticello", "Trenton…
## $ deceased <chr> "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y"…
## $ zip <chr> "97221", "22903", "52310", "62293", "77511", "89117",…
## $ state <chr> "OR", "VA", "IA", "IL", "TX", "NV", "CA", "IL", "NM",…
## $ capacity <chr> "$75k - $100k", "$25k - $50k", "$100k - $250k", "$25k…
## $ capacity_source <chr> "screening", "institutional", "screening", "screening…
## $ race <chr> "Native Americans or Alska Natives", "Asian", "Non-Hi…
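Several of these character columns may be easier to work with after conversion. The sketch below is one possible approach, not the project's own code: it splits `name` into two parts, turns `deceased` into a logical, and makes `capacity` an ordered factor. The `bio_chr` data frame holds two rows copied from the output above, and `capacity_levels` is an assumed ordering that contains only the bands visible in this document (written in lower case); the full data may contain others.

```r
library(dplyr)
library(tidyr)

# Two rows copied from the output above (illustrative only)
bio_chr <- tibble::tibble(
  name     = c("Tran, Ntxuam", "Probeck, William"),
  deceased = c("Y", "Y"),
  zip      = c("97221", "22903"),
  capacity = c("$75k - $100k", "$25k - $50k")
)

# Assumed ordering, built only from bands that appear in this document
capacity_levels <- c("$5k - $10k", "$10k - $25k", "$25k - $50k",
                     "$50k - $75k", "$75k - $100k", "$100k - $250k")

bio_typed <- bio_chr %>%
  # "Last, First" becomes two columns
  separate(name, into = c("last_name", "first_name"), sep = ", ") %>%
  mutate(
    # assumes "Y" marks a deceased donor; NA stays NA
    deceased = deceased == "Y",
    # ordered factor so that bands can be compared and sorted
    capacity = factor(capacity, levels = capacity_levels, ordered = TRUE)
  )
```

`zip` is deliberately left as character, since converting ZIP codes to numbers drops leading zeros.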
There is 1 date variable:
## Rows: 100,000
## Columns: 1
## $ birthday <date> 1924-11-18, 1922-05-11, NA, 1925-05-11, 1922-01-23, 1923-05…
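A date column such as `birthday` is often converted to an age. The following is a minimal base R sketch under two assumptions: the helper name `age_on()` is made up for this example, and the reference date is the document's "as of" date, 2020-08-12. Missing birthdays remain `NA`.

```r
# Age in completed years on a reference date (hypothetical helper)
age_on <- function(birthday, as_of) {
  b <- as.POSIXlt(birthday)
  r <- as.POSIXlt(as_of)
  # TRUE when the birthday has not yet occurred in the reference year
  not_yet <- (r$mon < b$mon) | (r$mon == b$mon & r$mday < b$mday)
  r$year - b$year - not_yet
}

# First three birthdays copied from the output above
birthdays <- as.Date(c("1924-11-18", "1922-05-11", NA))
ages <- age_on(birthdays, as.Date("2020-08-12"))
```

With the full data this could be used as `mutate(bio, age = age_on(birthday, as.Date("2020-08-12")))`.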
The giving data has 6 variables and 378,001 observations. The data is stored at the gift level. Each row of the data represents a unique gift and attributes associated with that gift.
There are 4 numeric variables:
## Rows: 378,001
## Columns: 4
## $ household_id <dbl> 9420483, 6312023, 6312023, 2669409, 2669409, 5199241, 51…
## $ id <dbl> 8713532, 4279585, 8942151, 6180247, 8906224, 1131709, 41…
## $ gift_id <dbl> 2912360, 2912405, 2912405, 2912436, 2912436, 2912487, 29…
## $ gift_amt <dbl> 405, 1516, 721, 224, 457, 1217, 492, 286, 1962, 6653, 12…
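Because the giving data has one row per gift, most analyses start by rolling it up to the donor or household level. In the values printed above, `gift_id` 2912405 appears on two rows with different donor ids in the same household, so counting rows and counting distinct `gift_id` values can give different answers. The sketch below uses only the six rows visible in the output (the `giving` data frame here is a hand-built illustration) and is one possible way to summarize them.

```r
library(dplyr)

# First six rows copied from the output above (illustrative only)
giving <- tibble::tibble(
  household_id = c(9420483, 6312023, 6312023, 2669409, 2669409, 5199241),
  id           = c(8713532, 4279585, 8942151, 6180247, 8906224, 1131709),
  gift_id      = c(2912360, 2912405, 2912405, 2912436, 2912436, 2912487),
  gift_amt     = c(405, 1516, 721, 224, 457, 1217)
)

# Donor-level totals: one row per id
donor_totals <- giving %>%
  group_by(id) %>%
  summarise(n_rows = n(), total_given = sum(gift_amt))

# Household-level totals: distinct gifts and summed amounts
household_totals <- giving %>%
  group_by(household_id) %>%
  summarise(n_gifts = n_distinct(gift_id), total_given = sum(gift_amt))
```

Whether amounts on rows that share a `gift_id` should be added together or treated as the same gift is not stated in this document, so the household sum above is an assumption to verify against generate_data.R.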
When loaded with the default settings, there is 1 character variable.
There is 1 date variable.
The engagement data has 8 variables and 100,000 observations. The data is stored at the donor level. Each row of the data represents a unique donor and attributes associated with that donor.
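Since both the biographical and the engagement data are stored at the donor level, they can presumably be combined with a join on `id`. That `id` identifies the same donor in both tables is an assumption, although the first two ids printed for each table do match. The two small data frames below are hand-built from values shown in this document; the third biographical row has no engagement match only because this extract is truncated.

```r
library(dplyr)

# Small extracts built from values printed in this document
bio_small <- tibble::tibble(
  id   = c(9671621, 6098249, 7804098),
  name = c("Tran, Ntxuam", "Probeck, William", NA)
)
engagement_small <- tibble::tibble(
  id        = c(9671621, 6098249),
  volunteer = c(1, 1)
)

# Keep every donor from bio and attach engagement columns where available
donors <- bio_small %>%
  left_join(engagement_small, by = "id")

# Donors without any engagement record
unmatched <- bio_small %>%
  anti_join(engagement_small, by = "id")
```

An `anti_join()` in both directions is a cheap check that the two tables cover the same set of donors before relying on the joined result.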
There are 4 numeric variables:
## Rows: 100,000
## Columns: 4
## $ id <dbl> 9671621, 6098249, 3543434, 7372006, 3899439, 691544…
## $ numer_of_contacts <dbl> 0, 0, NA, 0, NA, 0, NA, NA, 0, 0, NA, 0, 2, NA, 0, …
## $ volunteer <dbl> 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, NA,…
## $ time_on_site <dbl> 65, NA, 362, NA, NA, NA, NA, NA, NA, 910, NA, 175, …
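The numeric engagement columns contain many `NA` values, so summaries need an explicit decision about missing data. Whether `NA` means "zero" or "unknown" is not stated here; the sketch below treats it as unknown and simply excludes it. The `engagement` data frame is rebuilt by hand from the first five values printed above.

```r
library(dplyr)

# First five rows copied from the output above (illustrative only)
engagement <- tibble::tibble(
  id                = c(9671621, 6098249, 3543434, 7372006, 3899439),
  numer_of_contacts = c(0, 0, NA, 0, NA),
  volunteer         = c(1, 1, 1, 0, 1),
  time_on_site      = c(65, NA, 362, NA, NA)
)

engagement_summary <- engagement %>%
  summarise(
    share_missing_contacts = mean(is.na(numer_of_contacts)),
    share_missing_time     = mean(is.na(time_on_site)),
    mean_time_on_site      = mean(time_on_site, na.rm = TRUE), # NAs excluded
    volunteer_rate         = mean(volunteer, na.rm = TRUE)
  )
```

Reporting the share of missing values next to each average makes it clear how much data the average is actually based on.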
There are 3 character variables:
## Rows: 100,000
## Columns: 3
## $ gift_officer <chr> NA, NA, NA, NA, NA, NA, "Banks, Kevin", NA, NA, NA, NA, …
## $ event <chr> "N", "N", "Y", "N", "N", "N", "N", "Y", NA, "N", "N", "Y…
## $ interests <chr> "fashion,hunting/fishing,skiing,sports", "cars,boating/s…
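The `interests` column packs several values into one comma-separated string, which is awkward to count or filter. One possible way to reshape it, sketched with the single complete value visible above, is `tidyr::separate_rows()`, which produces one row per donor and interest. The `engagement_interests` data frame is a hand-built illustration.

```r
library(dplyr)
library(tidyr)

# First row copied from the output above (illustrative only)
engagement_interests <- tibble::tibble(
  id        = 9671621,
  interests = "fashion,hunting/fishing,skiing,sports"
)

# One row per donor-interest pair
interests_long <- engagement_interests %>%
  separate_rows(interests, sep = ",")

# Number of donors listing each interest
interest_counts <- interests_long %>%
  count(interests, name = "n_donors")
```

Note that a value such as `hunting/fishing` stays as a single interest here, because only the comma is used as the separator.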
There is 1 date variable.